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A dynamical model of an ecological community is analyzed within a "mean-field approximation" 
in which one of the species interacts with the combination of all of the other species in the community. 
Within this approximation the model may be formulated as a master equation describing a one-step 
stochastic process. The stationary distribution is obtained in closed form and is shown to reduce to a 
logseries or lognormal distribution, depending on the values that the parameters describing the model 
take on. A hyperbolic relationship between the connectance of the matrix of interspecies interactions 
and the average number of species, exists for a range of parameter values. The time evolution of 
C^h' the model at short and intermediate times is analyzed using van Kampen's approximation, which 

is valid when the number of individuals in the community is large. Good agreement with numerical 
simulations is found. The large time behavior, and the approach to the stationary state, is obtained 
by solving the equation for the generating function of the probability distribution. The analytical 
results which follow from the analysis are also in good agreement with direct simulations of the 
model. 
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I. INTRODUCTION 



The steady accumulation of data on all aspects of the very diverse ecosystems that exist on Earth has revealed a 
number of generic features M. Examples include (i) in species-rich ecosystems, the number of species, S(n), with 
n individuals following a power-law, S(n) sw n~' y ', where 7 is close to one 0,0], (ii) a relation between the number 
of species in the ecosystem, S, and the connectance, C* - defined as the number of predator-prey links between 
pairs of species divided by the total possible number of links — which has the hyperbolic form C* ~ S~ 1+n , with 
qv ' i) 6 [0,1] S, and (hi) other power-law distributions concerning the extinction of species, for instance, where the 
lifetime of species, T, appears to be well-described by the distribution N(T) rs T~ 8 , with between 1.1 and 1.6 [0. 
There is an urgent need for models of ecosystems to be developed which will allow the underlying mechanisms which 
lead to these regularities to be understood. These models need to be defined for an arbitrary number of species, have 
a set of rules specifying the interaction between pairs of species which is reasonably simple and based on general 
features such as the competition between species, and have a stochastic element to reflect the randomness of events 
which are inherent in real systems. In Ref [pi a model of this type was introduced in order to investigate the generic 
features outlined above. An analysis of the model was begun in that paper, where both numerical and analytical 
work showed predictions of the model to be in agreement with field data. Here we present a more detailed analysis 
of the model, using a variety of techniques, and compare the results of this analysis with that from real ecosystems. 



We begin by defining the model. 

The ecosystem under study is taken to have N individuals and S possible species. It is modeled as a directed 
graph with the nodes labelled by % = 1, ..., S representing the species, and the links representing the (predator-prey) 
interaction between the species at the two nodes being joined. This interaction is assumed to be given by a single 
real number, denoted by f2y for the link to j from i. Thus, the interaction between the species in the ecosystem is 
completely specified by the S x S real matrix $1. Links from a node to itself are not allowed and therefore this matrix 
has zero entries on the diagonal. The antisymmetric matrix $y = fly — Qji has a more direct interpretation as the 
"score" of species i against species j: 

• If Sij > 0, then j acts as a resource for i. 

• If Sij = 0, there is no interaction between i and j. 

• If Sij < 0, then i acts as a resource for j. 

Modelling multispecies ecosystems involving species-species interactions or connections of this type, has a long 
history M. Originally, population dynamics equations, such as the Lotka-Volterra equations, were written down for 



two species and then for many species. If one imagines studying the equations near to any fixed point that might 
exist, it is permissable to linearize about the fixed points, and the entire model is then specified by a single S x S 
matrix — the stability matrix. Whereas, for systems involving two species it might be useful to calculate this matrix 
in terms of the original parameters of the model, for systems of many species there are simply too many parameters 
and so the emphasis changed to trying to investigate general properties that such matrices might have. An obvious, 
but crude, assumption that the entries were random, was first investigated by May |J7) who found that the connectivity 
of the matrix was important in determining its stability properties. Since then, this has remained a central issue in 
ecology H , as has the study of the abstract theory of species connected by a complex network of interactions |J . 

Having described the basic idea we will use in our approach, we now need to specify the interaction matrix $7. 
Since, the connectivity seems to emerge as an important quantity in both theoretical and experimental studies we will 
assign a fixed conductivity C to ft, so that we may study how properties of the system change as C is varied. Other 
than this, and the fact that the diagonal entries are zero, we will not impose any other restrictions on fl. We now 
have to define the dynamics. The goal is to define a set of rules which is simple, but which builds up a complex model 
ecosystem, after a sufficiently long time, showing the non-trivial emergent behavior mentioned at the beginning of 
this section. We do this by assigning the (off-diagonal) entries of fl, in a purely random way at t = 0, and updating 
the system at discrete time steps as follows. At each time step: 

1. With probability (1 — /i), pick two individuals at random. Suppose they belong to species i and j and that 
Sij ^ 0. Replace the individual belonging to the species which has a negative score against the other species 
by a new individual of the more successful species. So, for example, if Sij > 0, the total number of individuals 
belonging to species i goes up by one, and the total belonging to species j goes down by one. If Sij — 0, no 
action is taken. 

2. With probability fj,, pick an individual at random. Replace it by another individual of any of the S species. 

These rules have an obvious interpretation. The first simply ensures that the most successful species, in the sense of 
the ones having the highest scores, grow at the expense of the less successful ones. However, if the dynamics consisted 
only of this rule, then eventually all species but one would go extinct. Therefore, a second rule has to be introduced 
in order to obtain a diverse ecosystem. The simplest choice is to violate the first rule occasionally, by giving even 
unsuccessful species an opportunity through purely random events. This is best not thought of as a mutation or 
speciation, but as an immigration event from an area outside the ecosystem under study. 

We have not specified the initial distribution of the entries in ft and there is a certain amount of freedom regarding 
this choice. In our simulations we have chosen the fiy (i ^ j) randomly from a uniform distribution on [0, 1], but 
any other choice is equally valid, since only the sign of Sij is important. Since the probability that fly = flji ^ is 
vanishingly small, it is almost certain that if the condition Sij — mentioned in rule 1 holds, then both O^ and fljt 
are zero, and species i and j are not connected by a predator-prey relationship. Since the probability of any matrix 
element being zero is 1 — C, the probability that both fiy and ilji are zero is (1 — C) 2 , and from what has been said 
above, the probability that Sij is non-zero is C* = 1 — (1 — C) 2 . 

For those simulations that start with no species in the system, a generalized (S + 1) x (S + 1) real matrix $7' with 
an extra row and column denoted by needs to be introduced. Then, if Sio > 0, empty space acts as a resource for 
i. If Sio < 0, then species i fails to invade empty space. So, on average, the expected number of species that can 
actually invade empty space is SC/2 and the number of species in the pool that can never interact with empty space 
is (1 - C)S. 

What has been described above is a strongly interacting, stochastic multispecies model and as such is extremely 
difficult to study analytically. It is, however, relatively easy to simulate, given the straightforward nature of the 
algorithm described above, and we will discuss the results of extensive numerical studies in later sections of this 
paper. It would still be very useful to have some approximate treatment available which would, at the very least, 
help to suggest forms which could be use to fit the data and simulations. Fortunately, we can do much better than 
this. A mean field theory of the above model yields a master equation which can be analysed using a number of 
standard techniques. Much of the paper will be concerned with the derivation of these results and their subsequent 
interpretation. 

The plan of the paper is as follows. In section II we derive the master equation within the mean field approximation 
and in section III we investigate the nature of the stationary state. The time-dependent properties are the subject 
of the next two sections: within a Gaussian approximation in section IV and a more general study in section V. We 
conclude with a summary of the work presented in the paper in section VI. There are two appendices: Appendix A and 
Appendix B contain technical details which are used to derive some of the results in sections III and V respectively. 



II. MASTER EQUATION 

In this section we will derive a master equation which approximately describes the complex dynamics introduced 
in section I. The key simplification is the use of a type of mean field theory. We focus on one of the S species, which 
we shall call species A. The other (S — 1) species are no longer distinguished as separate species and are simply 
lumped together and denoted as species B. The B species will be regarded as some kind of average species — a kind 
of effective background population with which species A interacts. There are various assumptions inherent in this 
approach. For instance, that the rate of reproduction is the same for all species, so that a typical species (A) can 
be picked out as representative. It does however, reduce the model to one in which just two species are interacting, 
namely A and non-A = B. It is now relatively straightforward to derive a master equation which describes the 
dynamics of this process. 

To derive this equation, first suppose that u = for simplicity. Then only rule 1 is in operation. In picking two 
individuals from a set of N individuals of the S species, the following situations arise: (a) both individuals belong to 
species A, (b) one belongs to A and the other to B and (c) both individuals belong to species B. In cases (a) and (c) 
there is no action taken under rule 1. The probability of case (b) occurring is the sum of the probability that first an 
A is selected and then a B and the probability that a B is selected and then an A: 

n ( n — 1 \ / n \ / n \ In ( N 
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where n is the number of individuals of species A in the ecosystem. We now have to focus on the quantity Sab in 
order to implement the rule. The probability that it is non-zero is C* and we would expect that, on average, half of 
the events the individual from species A will have a higher score than the individual from species B i.e. Sab < and 
the other half of the events to have Sab < 0. Therefore, the probability that at each time step the number of species 
A increases by one is 

n ( N - 
W(n + l\n) = C*-( w _ ] 

and the probability that at each time step the number of species A decreases by one is 

W(n-l\n) = C* U /A ""' 
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We will show shortly that this process leads to a stationary probability distribution which is non-zero only if n = 
or n — N, that is, only if either species A dies out completely or dominates completely. The second rule ensures that 
some diversity is retained. 

So suppose that we now include the second rule. The transition probabilities above have now to be multiplied by 
(1 — u) and those involving the second rule will involve a factor \i. Specifically, the probability that the individual 
picked, when implementing the second rule, is replaced by an individual of a given species is n/ S. If we ask that 
this given species is A, then since the probability that the individual picked belongs to species B is (1 — n/N), the 
additional probability due to rule 2 that at each time step the number of species A is increased by one is 

S V N 

Similarly, the probability that an individual is replaced by an individual of a different species is /j,(S — 1)/S. Since 
the probability that the individual picked belongs to species A is n/N, the additional probability that at each time 
step the number of species A is decreased by one is 

^(S-D- 
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Putting the two rules together, gives the one-step transition probabilities g n = W(n + l|n) and r„ = W(n — l|n) as 

^y*/i \ n ( N - n\ u n 

Q n = C (1 - u)— + —(1 ) , (1) 
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We can now write down a master equation describing this one-step stochastic process |10|,|llj]. If P(n,t) is the 
probability of species A having n individuals at time t, the master equation takes the form 

£ = r n +iP{n +l,t) + g n -iP{n - 1, t) 

-(r n +g n )P(n,t). (3) 

Equation (]3j) is only valid for values of n not on the boundary (i.e. for n^O and n ^ N); for these values special 
equations have to be written reflecting the fact that no transitions out of the region [0, N] are allowed. However, from 
(|l|) and (g) we see that gw — and tq — 0, and if additionally we define r^+i = and g_i = 0, then (J3J) holds for all 
n = 0, 1, ..., N. To completely specify the system we also need to give an initial condition, which will typically have 
the form P(n, 0) = 5 nm for some non-negative integer m. 

We will end this section by determining the stationary probability distribution, P s (n). Setting dP(n)/dt — 0, one 
obtains 

r n +\P s {n + 1) - g n P s {n) = r n P s (n) - g n -iP s (n - 1) . (4) 

This is true for all n, which implies that r n P s {n) — g n -iP s (n — 1) = J, where J is a constant. Applying the boundary 
condition at n = 0, we find that J — and therefore 

r n P s {n) = g n -iP s {n - 1) ; n = 0, 1, ...,JV. (5) 

If /i y^ 0, then r n ^ for all n such that < n < N, and therefore 

P s (n) = 9n-i9n-2...go Ps{Q) n= (g) 

r„r„_i...ri 

The constant -Ps(O) can be determined from the normalization condition 

JV 

^P s (n)^P s (0) + ^P s (n) = l : (7) 

n—0 n>0 

N 

(p s (o))- i = i + y g "- lg "- 2 - g0 . (8) 
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At this point it is convenient to introduce a set of combinations of the constants of the model which will appear 
frequently in the analysis. These are: 

H* = n/[(l - n)SC*] , \*=v*(N-l) 

and v* = N + n*(N-l){S-l). (9) 

The transition probabilities (|l|) and (|2|) may now be written in the more compact form 

g n = < ^^Zf ) (N-n)(Y+n) and 

_c*(i-p) n(i/ ,_ n)> (10) 



N{N- 1) 



Substituting © into (|) gives 



AT 



(Pa^r^x; 



n=0 

AT 



A^\ r(n + A*)r(V* -n) 



ny r(A*) r(j/*) 

N\. 1 , n r(n + A*) r(l-/, i 



n—0 v 7 ' 



This sum takes the form of a Jacobi polynomial P]^* (x) |Q, with a = — v* , [3 = A* + v* ~ (N + 1) and x = — 1, 
which can itself be expressed in terms of gamma functions for this value of x. So using (o) we find 

_ /JV\ r(n + A*) I>* - n) r(A* + v* - N) 

sW ~\n) r(A*) i>*-7V) r(A* + ^*) ■ l j 

In various intermediate expressions we have assumed that v* is not an integer, but this final result is well defined for 
all meaningful ranges of the parameters, since from (g) we can see that v* > N and A* > 0. Moreover P s (n) > for 
all n = 0, 1, ..., N and J2 n P s (n) — I by construction. By introducing the beta function (3(p, q) — T(p)T(q)/T(p + q), 
the stationary, normalized solution can be written in the more compact form 
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Finally, if \x = 0, tn = 0, so (^ no longer holds for n = N . Using the result that go = in this case, one finds 
that P s (n) = for n = 1, ...,JV — 1. By normalization we can write P s (0) = C and P S (N) = 1 — C, where C is a 
constant. So, as mentioned earlier, either species A is the only surviving species or it goes extinct. In other words, in 
the stationary state only one species survives. Since all species are assumed identical, it follows that when p, = 

P.(P) = ~ ; P S {N) = 1 - i ; P s (n) = for < n< N. (14) 

Although we have obtained the exact solution for the stationary distribution ( |12| ) in terms of nothing more complicated 
than gamma functions, we still need to simplify it if we are to compare the result with data. In the next section, 
we will derive simpler forms for the stationary probability distribution which are valid in different regions of the 
parameter space of the model, and compare these with simulations. 

III. STATIONARY STATE 

We have so far been discussing the stationary state from the point of view of a time-independent solution of 
the master equation. But let us now ask the question in a biological context: are ecological communities in stable 
equilibria? Although is obvious that environmental variability and chance have a great impact on ecosystems, some 
well-defined, time-independent, patterns arise when natural ecosystems are observed. The model we have introduced 
reaches a well-established dynamic stationary state which allows us to study some of these patterns. A particular 
example of interest is the way that individuals are distributed among species. In any island where colonization from 
the mainland and local extinction take place, a dynamical equilibrium between these two processes is reached |l3|| . In 
these situations our model applies and can help to understand the patterns observed. 

Highly diverse ecological communities are formed when a large number of different species are present. The esti- 
mation and characterization of such biological diversity is not only a central issue in theoretical ecology, but also a 
question of practical concern for nature reserve design and conservation biology in general. In any ecological commu- 
nity species vary considerably in the number of individuals that belong to that species. Some species are very difficult 
to find because they are very rare. Some of them are extremely common. How are individuals distributed among 
species? What factors affect this distribution? The classic way of studying this topic is by means of species abundance 
relations — the "relations between abundance and the number of species possessing that abundance" p3 . Different 
types of species abundance relations have been used to fit to real species abundance data. Some of them have been 
justified on theoretical grounds (see Hj for a review). One of the most widely used species abundance distribution was 
first discussed by Fisher, Corbet and Williams in 1943 jlfj]. The distribution is defined by two parameters x and a: 

rvT n 

S(n) = , (15) 

n 

where S(n) is the number of species having n individuals. Since ( jig ) summed over n gives a logarithm, this is known 
as the logseries distribution. It is very common as a sampling distribution in the ecological literature, although it has 
also been derived on theoretical grounds Jl6|,|l7j • 

The abundance distribution that has received more attention from ecologists, however, was introduced by Preston 



in one of the most influential papers on ecological theory 18 19j. As May remarks "theory and observation points to 



its ubiquity once S 3> 1, when relative abundances must be governed by the conjunction of a variety of independent 
factors" E3] . The distribution is the lognormal distribution, so called because the logarithm of species abundances is 
normally distributed: 



S(R) = S(n )e X p{~), (16) 

where, following Preston's definitions, R — log 2 (n/no) is a logarithmic measure of the abundance in relation to no 
— the abundance value where the distribution has its maximum. So, S(R)dR is the number of species having their 
logarithmic relative abundance between R and R + dR. Notice that both equations ( J15| ) and ( [lq ) must be divided by 
the total number of species to be properly understood as estimations of the probability distribution function. 

In this section we want to compare the exact result for P s (n) with these two distributions — the most widely used 
abundance distributions in the ecological arena. We will derive simpler forms for the stationary probability distribution 
which are valid in different regions of the parameter space of the model, and compare these with simulations of the 
original model (that is, without making the mean-field approximation). Lognormal and logseries distributions will 
naturally emerge for different well-defined immigration regimes. In a forthcoming paper we will analyse a large 
quantity of species abundance data from different ecological communities in detail. 

We begin by discussing one situation in which the logseries distribution occurs. It turns out to be convenient to 
rewrite the result ([12|) for P s (n) by breaking it down into three separate parts: 
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We will now give a simpler form for each of these expressions, being careful to state the range of validity of our 
approximations in each case. Details of the derivation of these results is given in Appendix A. The non-trivial 
behavior occurs for relatively small values of n, so in what follows we will only be interested in values of n up to n max , 
where rt max <C N and N 3> 1. We will also suppose that there are many possible species: S>1. 



From Eqns. ( |A5| ) and ( |A3| ) we have that 



^i(n) « — ; n > 1 ; A* < — ^ — (19) 

n m n mav 



and 



F 3 (n,N)nexp(-n\*S/N) ; A* S < N ; A*S « ( -^- J . (20) 

Therefore ©-© give 

P s (n) = ICn- 1 cxp(-rVS') , (21) 

where /C = A*F s (0). This is the logseries (JTsj) with x = e - ^* 5 ' and expressed as the fraction of species represented 
by n individuals in the steady state. Note that (p~5|), by contrast, gives the absolute number of species with a given 
abundance n. 

Since S < N, the condition X*S <C ^V is redundant when the stronger condition A* -C l/lnn max is imposed. 
Therefore, (Elh holds when 

N 1 

1 < n < n max < and A* < . (22) 

\/X*S mn max 



To find an approximate form for /C, we use (A?) which gives an approximate form for P s (0) = J-2(N). Under the very 



reasonable conditions that A* is much less than N/S, yN and S, but with A* 5* > 1, we find 

K » A* {n*S) x " . (23) 



Fig [j] shows the results of different simulations which have been performed for increasing values of the immigration 
parameter. In order to calculate the species relative abundance distribution an ensemble average has been performed. 
For each plot a collection of 2000 replicas has been simulated. For each replica the probability distribution P(n, t) 
has been calculated after 500000 simulation time steps. In Fig |], the A* values increase from 0.033 when fi = 0.001 to 
3.3 when /i = 0.1. The last three plots do not show such a good match with the logseries approximation as the first 
three. Even in the upper three plots, where there appears to be a good fit with the logseries, we would only expect a 
complete match for 1 < n < 10. From (p2|), we should bear in mind that this is only expected to be true as long as 
A* < l/lnn max . For instance, A* = 0.22 for [i = 10~ 2 (for the parameter set N = 5000, S = 300 and C = 0.5) and 
l/lnn max is 0.434 when n max = 10. 
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FIG. 1. Stationary probability distribution P a (n) obtained from the exact solution (113) (solid line) and the logseries 
approximation (Gil) (dotted line), with the simplest form for the multiplicative factor /C J23|), for different immigration values. 
In each plot the species relative abundance distribution resulting from the individual based model is also shown (noisy solid 
line) . 

In Fig two simulation results are displayed. The stationary solution is also shown in both cases for comparison 
purposes. The stationary solution is calculated numerically either by direct application of equation (|1J), as has been 
done in Fig |lj, or by means of an algorithm that can find the stationary probability distribution of any one-step 
stochastic process if it exists, as in Fig |[ This algorithm is based on the subroutine TRIDAG (g0|. To describe it, 
we first write the master equation (H) in the more general form: 



dP(n,t) 

dt 



= Y. Wn,n' P(n',t)- Y, Wn',nP(n,t), 



where r„ = W n -i, n and g n = W n +i ]Ti . If we now introduce W n ,n' = (1 - ^n,n') W n 
vector P{t) = (P(l, t), . . . , P(N, £)), (EJ) may be written in the matrix form 



(24) 

Sn,n> I2n"^nWn",n and the 
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—P = W P. 

dt 



(25) 



Finding the stationary stationary distribution P s (n) — the vector P s = (P S (Q), . . . ,P S (N)) — is then equivalent to 
solving a system of N + 1 linear equations W ■ P s = in N + 1 unknowns: P s (n), n = 0, . . . , N. In any one-step 
stochastic process the matrix W is tridiagonal. Our algorithm takes advantage of this feature to solve the system. 
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FIG. 2. Abundance distributions for two individual-based simulations are presented for two extreme values of the im- 
migration parameter. Time is measured in simulation time units. The stationary distribution P s (n) is also shown in both 
cases. 

Another quantity which is useful in comparing model predictions to data is the average number of species in the 
stationary state, which we denote by (S). Let Pi(n) be the probability that there are n individuals of species i in the 
ecosystem. Therefore the probability that there is at least one individual of species i is 1 — -Pi(O) and so the average 
number of species is (S) = Ylifi ~ ^(0))- Within the mean field approximation Pi(0) is the same for all species i: 
Pi(0) = -Ps(O) (the subscript s denotes "stationary", as before), so that 



(S)=J2(l-Ps(0)) = (l-Ps(0))S. 



(26) 



Under the conditions X*S > 1, A* <C e and | In C* | < |ln/x|, we show in Appendix A that (see eq. ( A10 )) 

(S) - (C*r 1+e , (27) 

where e^ 1 rs | In /x| (see Fig ||). Inverting this relationship gives C* ~ (5)~ 1+? ' with r\ = e/(e — 1). The condition 
X*S <^ 1 is essentially equivalent to [iN > 1 (for a connectance that is not too small). For systems of interest iV 
is very large, and hence /i must typically be very small if the hyperbolic relation ( p7| ) is to hold. Such a tiny value 
of /i means that e and hence 77, will be close to zero. The form of the relationship between C* and (S) when the 
immigration parameter has a larger value will be discussed elsewhere. 
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FIG. 3. Species-connectivity relationship for different parameter values. The expected number of species in the system in the 
steady state is shown plotted against the connectance C* . Notice that as long as A* S remains close to but slightly greater than 
1, a very good agreement is shown between the exact form (J26J) (single line in the plot) and the hyperbolic species-connectivity 
relationship (E7|) (double line). 

In Fig |4| the species-connectivity relationship calculated from the individual modelling approach is shown. After 
carrying out 600000 simulation steps, a 1000 ensemble average was calculated for each connectivity value. The initial 
condition is the empty system. Although our mean field approximation captures the essentials of the hyperbolic-like 
behavior of the species-connectivity relationship, there is a systematic deviation from the mean field prediction in the 
simulated curves. 
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FIG. 4. Species-connectivity relationship. Two simulations are presented. Standard deviations from the ensemble average 
value are shown. In the plot on the left, X*S ~ 25 — too high a value to fit the approximation given by (p7f). So in this case 
the only plotted curve is the exact mean field relation (feq). In the plot on the right, conditions needed to apply (E7|) are quite 
well fulfilled and two curves are plotted: the exact mean field relation (solid line) and the approximation (dotted line). 



To sum up, the exact solution given in ( |13| ) admits a logseries representation for low immigration regimes. For 



these low immigration values a hyperbolic-like relation is also observed between the mean number of species in the 
stationary state and the connectivity level given by the trophic relationships pre-defined in the community matrix, CI. 
We will now argue that the exact stationary distribution probability is also very well approximated by a lognormal 
distribution for intermediate to high immigration regimes as shown in Figs ^| and [2| 

The idea behind the analysis we will present, is to find at which values of n, if any, P s (n) has a maximum. We then 
expand P s (n) about this maximum to see to what extent this function can be analytically described by a lognormal 
distribution. 

First of all, from Fig [5| we can see that P s {n) may admit a Gaussian representation for some parameter values. 
So let us look at this case first, before discussing the lognormal. To investigate for what parameter values this may 
occur, we first find the position of the maximum of P s (n). It is more convenient to consider lnP s (n) rather than 
P s (ri). From (Q, we get: 

dlnPs(n) = -V(n + 1) + tl>(n + A*) - i>{v* -n)+MN-n + l), (28) 

an 

where 

d\nT(z) 



j,(z) = 



dz 



Setting (|28|) equal to zero gives the maximum value of P a {n) at n = h. If all arguments of the psi- functions can be 
considered to be large enough, which is true if n <C N but reasonably large (e.g. n > 100), these functions can be 
approximated using ip(z) ~ lnz fl^j . So, 

dln f s (") ~ _ i n ( n + i) + i n ( n + A*) - ln(i/* - n) + ln(N - n + 1) = 0, (29) 

an 

from which one finds that the maximum is given by: 

^_ (iV + 2)(A*-l) n 



A*5-2 



(30) 



From ( |30| ) we can see that if A* S < 2 (very low immigration regimes), the numerator and denominator are both 
negative and a maximum exists. However it is inadmissible, since it violates the condition h <IC N. Therefore, a 
necessary condition for the existence of a maximum, n, is that A* > 1. 

Now, we perform a Taylor expansion of (|l2j) about n to quadratic order. If n = n + Sn, then for 5n small: 



In P s {n) = In P s (n) + 



1 d 2 lnP s (n) 



2 dn 2 



(n - nf + 0{5nf 



Since n is a maximum, d 2 In P s (n) /dn 2 \ n= n < 0, and so we set this equal to — 1/er 2 . Then ignoring the 0(Sn) 3 terms 
and exponentiating gives 

P s (n) = P s (n)exp(-^^). (31) 

Under this approximation (n <C N, but reasonably large) it is not very difficult to derive an analytical expression for 
the variance: 



_ ^/n 2 + (X* + l)h + X* 
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FIG. 5. Lognormal approximation (the long-dashed curves) for the exact stationary solution ( |l3[ ) (solid line curves). The 
parameter values that have been used are N = 100000, S = 300 and C = 0.5. The Gaussian approximation (dotted curves) is 
also shown. 



Since the Gaussian distribution is completely specified by its first two cumulants, fixing n and a give n by ( pC 
and (|3^) respectively, determines the entire curve. The dotted lines in Fig [| show this curve i.e. ( |3l| ) with the 
two parameters fixed by (30) and (j32|). The upper two plots show very good agreement with the exact mean field 
approximation; in the lower two plots the agreement is not so good. 

To approximate the exact stationary solution as a lognormal distribution (pit), we will proceed in a similar way. 
Equation ( |l6|) can also be written dividing by the total number of species as: 



IP 

2p 2 



V(R)=Mexp(- — )=Mexp[- 



(Inn — In no)' 
2^ 



(33) 



where a = pin 2 and where J\f is a normalization constant to be determined. So let us express the solution (p2J) as a 
function of Inn instead of n. After this change of variable a new, equivalent, probability distribution function arises, 
V(x), where x = Inn, that has to satisfy P s (n)dn = V s (x)dx, or, in other words, 



which implies that 



dlnV s (x) 
d.r 



V s {x) = -^ P s {n) = nP s (n) , 
dx 



1 + e x [-ip(e x + 1) + ip{e x + A*) - VK - e x ) + ip(N - e x + 1) 



(34) 



(35) 



Setting equation ( pq ) equal to zero, and using x = Inn, we obtain the position of maximum by finding the zero, n^, 
of the equation 



[ip(n + 1) - ip(n + A*) + i/j(v* - n) - ip(N - n + 1)] 



1 
n 



In exactly the same way as for the Gaussian case, we can write the Taylor expansion up to second order: 



or, equivalently 



lnV s (x) = lnT^o) - ^-j (% ~ x o) 2 



v s (x)^v s ( X o)cM- {lnn ^r o)2 ) 
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where xq = In no- Finally, using equation (p4), we get a lognormal expression for the mean field solution: 



K , (Inn — In no) 2 

— exp —j 

n 2a z 



Ps(n) = ^eM- ^\;r 0J ) (36) 



where K — n a P s (rL a ). 

The evaluation of the second derivative at x = x$ allows us to fix a value for the variance a 2 : 

4 = 1- (no) 2 [-/(n + 1) + ^'(no + A*) + ^>>* - no) - tf(N - n + 1)] . 

Although in this case there is no way to derive a simple, yet sufficiently general, analytical expression for the maximum 
no and the variance c 2 , in Fig H we have used the asymptotic series expansion for ip(z) p2| to calculate numerically 
both quantities. Once again, since the lognormal distribution is completely specified by no and a 2 , fixing these fixes 
the entire curve. The figure shows that the lognormal approximation matches the exact solution well for intermediate 
to high immigration regimes. We also note that, in general, the lognormal is a better fit to the exact solution than 
the Gaussian. 

Lognormal and logseries distributions have been used by ecologists to fit real abundance data for 
years OMMflnlpTj. Our results show that it is possible for both distributions to stem from the same general 
ecological process under different immigration regimes. If, for instance, we counted species abundances in a small 
area within a wood, the lognormal distribution would probably arise since that area is no doubt weakly isolated from 
the rest of the wood by external immigration. The same experiment performed in a rather isolated area might be 
expected to give rise to an empirical relative abundance distribution well fitted by a logseries function. 

Finally, in this section we calculate the diversity, H, of the ecosystem (also called the Shannon entropy) for our 
model in the stationary state. It is defined by 

s 
H = -^pilnpt, (37) 

»=i 

where pi is the probability that an individual selected at random from the system belongs to species i jjj. Clearly, 
Pi = Tii/N, where m is the number of individuals of species i in the stationary state. On the other hand, within our 
mean field approach, the quantity which we can calculate is P s {n) — the probability that a typical species will have 
n individuals in the system when it is in the stationary state. Since the number of species with n individuals in the 
system is just SP s (n), we may express ( p7\ ) within our approximation as 

H = - Y^ SP s (n) (n/N) In (n/N) . (38) 

n 

Multiplying (||) by n and summing, it is easy to find that (n) = N/S in the stationary state. Using this result we 
have 

S 
H = - — ^ P s (n)n (Inn -In N) 

71 

= -^{(nlnn)-(n)lnN} 

Q 

= InN (nlnn). (39) 

Therefore we only need to evaluate (nlnn) in the stationary state to find H . Fig g shows the result of performing 
this evaluation using the stationary probability distribution P s (n) , for increasing values of the immigration parameter 
/i. For comparison purposes, direct computation of the average number of species and the average entropy for 1000 
replicas of the model and its standard deviation is shown. It can be seen that for relatively low immigration rates the 
system tends to be saturated admitting as many species as possible. As immigration increases, the Shannon entropy 
grows steeper than the number of species does, meaning that immigration tends to equalize the number of individuals 
of different species first, rather than increase the actual number of species in the system. 

In Figs H and ra it can be seen that ensemble average curves for the expected number of species in the stationary state 
deviate systematically from the mean field approximation that we have implemented through the master equation (0), 
even though they show the same qualitative behaviour. The explanation for this slight disagreement comes from the 
way we are estimating the probability of an effective interaction within the system. When transition probabilities are 
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discussed (Eqns. ([j]) and (g)), the probability of interaction is split into a product of the probability of picking two 
potential interacting individuals multiplied by the probability of having an actual link between the species they belong 
to. However, these two events are not independent. The assumption of independence is just an approximation that 
allows us to simplify the system and get some insight into the dynamical processes that take place during simulations. 
In particular, for any parameter choice, the individual based model has more interactions than expected and cannot 
maintain the same number of species as predicted by our mean field approximation. 
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FIG. 6. The expected value for the number of species and the Shannon entropy as a measure of diversity is computed at 
the steady state using the stationary distribution (h3() for increasing values of immigration parameter. Ensemble averages of 
these two quantities are also shown for increasing discrete values of the same parameter. 1000 replicas of the model have been 
simulated. Values for the number of species and Shannon entropy have been recorded after 500000 simulation time steps, when 
the system has reached the stationary state. 



IV. TIME DEPENDENCE 

What can our model say about the assembly of an ecological community? Whenever species colonize a new island 
or any empty space, a new community builds up from scratch. The process that takes place is called succession by 
ecologists. The assembly of an ecological community has been studied both from the theoretical and empirical point of 
view. Many patterns have been found during the process of ecological succession (see J22| for a review). For instance, 
the number of species grows in a particular way that depends on the immigration from a biogeographical species pool. 
If our model is to make any prediction about succession, simulation time must have a direct meaning in terms of 
physical time. In our model the only connection to physical time comes from the immigration parameter, but the model 
has in fact two time scales. The external one is defined by the flux of individuals from the biogeographical pool and 
the internal one is defined by the flux of individuals (birth-death process) as a consequence of the internal dynamics 
(pairwise random encounters) within the system. Our immigration parameter captures the relative importance of 
these two different temporal processes. 

Therefore, having investigated the properties of the stationary state in the last section, we now move on to the 
study of the time evolution of the system within the mean-field approximation. The master equation (|3j) has transition 
probabilities g n and r n which are non-linear in n, so that an exact solution for the time-dependent behavior is not 
possible. However, since in the problem of interest N is very large, the possibility of performing a large- N analysis 
suggests itself. In this section we will describe the application of such an analysis — specifically van Kampen's large- TV 
method [ fLOf --to our model. This method has a number of attractive features, for instance, the macroscopic (i.e. 
deterministic) equation emerges naturally from the stochastic equation as a leading order effect in N , with the next to 
leading order giving the Gaussian broadening of P(n, t) about this average motion. The method is clearly presented 
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in jlO| , so we will only give a brief outline of the general idea and stress the application to the model of interest in 
this paper. 

If we take the initial condition on (g) to be P(n, 0) = S nym , we would expect, at early times at least, P{n,t) to 
have a sharp peak at some value of n (of order TV), with a width of order N.N^ 1 / 2 = N 1 / 2 . It is therefore, natural 
to transform from the stochastic variable n to the stochastic variable £ by writing 

n = N<f>(t) + iV 1/2 £ (40) 

where <fr(t) is some unknown macroscopic function which will have to be chosen to follow the peak in time. A new 
probability distribution II is defined by P(n,t) — II(£,i), which implies that 

P^-iV 1 / 2 ^— (41) 

dt dt d^ ' [ ' 

The master equation (0) may be written 

P n = {£-!) r n P n + (E- 1 - 1) g n P n , (42) 

where E (E _1 ) is an operator which changes n into n + 1 (n — 1), e.g. if f n is an arbitrary function of n, then 
Ef n = fn+i- In terms of £: 

£±1 - 1±N ~ 1/2 ^ + h N ~ 1 w + -- (43) 

Using (B0)-(M3) the original master equation for P(n,t) can be rewritten as an equation for II(£,£). By rescaling the 

time according to r = t/N, a hierarchy of equations can be derived by identifying terms order by order in powers of 
jy-i/2_ rp^g £ rg ^. twQ f tnese are: 

(44) 

(IT 

and 

r)n fi i fl 2 n 

(45) 
fT Ul; z y^- 

where 



a 2 ,oW>) - 2C*(1 - /i)0(l - 0) + | + | (5 - 2)0. (46) 

The first equation ([II]) is the macroscopic equation for 4>{t). It is easily solved to give 

4>{t) = ^(0)e-^ + | (1 - e~n . (47) 

Initially we ask that £(0) = 0, which means that 0(0) = n(0)/N — m/N. Going back to the t variable gives 

0(i) = ^e-^ + l(l-e-^). (48) 

The second equation (|4^) is a linear Fokker-Planck equation whose coefficients depend on time through <fi given by 
(|48|). It is straightforward to show that the solution to this equation is a Gaussian and so it is only necessary to 
determine (£) r and (6 2 )t to completely characterize IT(£,£). By multiplying ( (45[ ) by £ and £ 2 and using integration 
by parts, one finds jnj 

dr(0r = a[,o(<f>){0r and 

d T {e)r = 2a' liO (0)(£ 2 ) r + o a ,o W . (49) 

In our case d T {£) T = —^{£) T and so 
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since we have already assumed that (£)o — 0. A straightforward, but tedious, calculation now gives 



(50) 



A 



H S 2 

(2 V + /i) (5 - 2) 
/i S 



-Mr [l _ e -Mr] 



2?7^Te 



2^-2^ 



(51) 



where r? = C*(l - m) and A = (m/N) - (1/5). 

We have already commented that the solution of ( f45|) is a Gaussian. Specifically 



P(n,t) 



y/2TrN(p) T 



exp 



(n- Ncj){t)f 



(52) 



where (^ 2 ) r and 4>{t) are given by equations (51) and ( |4^ ) respectively. In Fig [7] and || a comparison between the 
numerical integration of the master equation and the Gaussian solution for different times is shown. The Gaussian 
behavior is lost for large times. In figure @ the Gaussian behavior is maintained longer due to a higher immigration 
rate; as the immigration rates increase still further, the Gaussian form persists for even larger times. 
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FIG. 7. Temporal evolution of the relative abundance distribution P(n,t). The parameter values are N = 1000, S = 50, 
C = 0.4 and fi = 0.001. An individual-based simulation, the numerical integration of P(n,i) at two successive times, and the 
exact stationary solution are also presented (inset). 
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FIG. 8. Temporal evolution of the relative abundance distribution P(n,t). The parameter values are JV = 1000, S = 50, 
C — 0.4 and fj, — 0.01. The temporal evolution of the variance and the average abundance computed using fell) and (Uq) 
respectively and from the master equation is also shown (inset). 

In order to compare the time behavior of the mean field approach introduced in this work through the master 
equation (0) with the time behavior of the individual based model (IBM) defined by the rules presented in section 
I, one should carefully define what is meant by time. Individual based simulations are performed by the iteration 
of an algorithm from the first step up to a given number of updating steps. In section 1, such an updating step 
has been defined as a time unit. Let us call it the simulation time unit. In [pi a different operational choice was 
made. Whatever the convention is, a clear distinction must be made between the simulation time and thephysical 
time needed to compare simulation results with either the numerical integration of the master equation (|3|) or the 
large- TV solution derived in this section. The question then arises: how is physical time to be tracked in any stochastic 
realization of the IBM? To analyze this point we will follow an argument given by Renshaw |23|. At any time t, 
the probability of an event occurring in the system can be estimated. Such a probability depends on the system 
configuration, i.e, the abundance of all present species, and on the relative immigration rate \i in relation to the 
internal dynamics rate \ — \i. Both rates have dimensions of T _1 . Obviously, it depends also on the other parameters 
of the model (N, S and C). Although the method described in Ref |2J| estimates every transition probability rate for 
all possible events in the system, there is no need to estimate the probability of this rather high number of possible 
events. There are only two relevant temporal processes: immigration and internal dynamics. So, it is enough to 
consider these two different possibilities: 



1 . An immigration event / occurs if any species from the pool happens to enter the system. The probability of a 
pool species entering the system in any small dt is: 



Pr{I}=n(t)dt, 



where the immigration rate is 



1G 



f'l 



s 

2 — 1 



nj{t) 

N 



2. An internal dynamics event D occurs when the interaction between a pair of individuals from two potentially 
interacting species gives rise to a change in their abundances. The probability of such an event occurring in any 
small dt can be written as: 



Pr{D} = r D dt , 



where the internal dynamics rate is 



I'D 



(1-/*) 



Ey^ rii(t) rij{t) 
^ N N 



and where Q(i) must be understood as the set of species 
the pre-defined interaction matrix Q. 



different from i — that are connected to i through 



Since the two events defined are independent from each other, the probability of occurrence of any one of them in any 
small dt is: 

Pr{I U D} = (n +r D )dt. 

When an event occurs there is a change in the actual configuration of the system either by immigration or by internal 
dynamics and the rates must be calculated again. So, approximately, on average the number of such effective events 
in any time interval of length t would be (rj + rjj) t, and would be distributed as a Poisson random variable with that 
mean. The important point is that now the probability of having no events in any time interval of length t, i.e, for 
any time between and t, can be written as: 



Pr{0} = 



— p-( r I+ r D)t 



(53) 



According to (p3|), the probability of having at least one event is 1 — e~('' /+rD )* -- the cumulative probability 
distribution for an exponentially distributed random variable. Therefore, the time to the next event is an exponentially 
distributed random variable with expectation l/(rj+ru). Then, we should sample that distribution in order to predict 
when the next effective event will take place. Accumulating these inter-event times during simulations we are able to 
track the physical time, which have the same units as [77 + ru]^ 1 , so the same time units which arise in the master 
equation ( p|) 

In figurelfl, the time evolution of the number of species in the system is shown. Different stochastic realizations 
of the IBM are presented. The numerical integration of the master equation allows the estimation of the expected 
number of species at any time in the system < S > through (pq). The average behavior of the different stochastic 
simulations is well captured by the prediction given by (|2^) where P(0, t) is computed at each numerical integration 
time step. 
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FIG. 9. Time evolution of the number of species in the system. The individual-based simulations match the expected value 
from the master equation. The system starts with no species. During the stochastic assembly process the number of species 
fluctuates, but the average behaviour is captured by the mean field approach represented by the master equation (|3[). 
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In Fig [h] the probability of having a species represented by n individuals at particular early times, P(n,t), has 
been plotted. It has been computed by performing a numerical integration of the master equation (|3|) (dotted line) 
and by means an ensemble average for the individual-based model after 5000 simulation time steps. Two extremely 
different initial conditions have been used. In the first one, there are no species in the system at time 0. Species 
enter the system and either establish themselves in it or not, performing what could be called a stochastic community 
assembly. In the second initial state, all species are represented in approximately equal numbers. Obviously, the 
one-humped distributions are obtained when the initial condition is a random mixture of species, which is represented 
by P(ra,0) = 1 if n = N/S and P(n,t) = if n ^ N/S in the master equation approach. The purely decreasing 
distributions are obtained when the initial state is a completely empty system. The agreement between the mean 
field approach represented by the master equation and the simulations is seen to be reasonable. 
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FIG. 10. Individual-based model and the temporal evolution of P(n, t) . Comparison between the mean field approach (dotted 
line) and a 2000 ensemble average after 5000 simulation time steps for random mixture and empty system initial conditions. 
In the first case, two different numerical integrations have been performed for t — 9946 and t — 10400. In the second case, 
just one numerical integration has been performed until t = 6725. The reason for these different times is that 5000 simulation 
updating steps can represent more or less physical time depending on the initial configuration of the system. 

As it has been shown in Figs [7] and 0, eventually, the probability distribution deviates from a Gaussian. While 
it is true that one could in principle calculate these non-Gaussian effects using van Kampen's approach (by taking 
higher order terms in N^ 1 ' 2 into account), the method gets increasingly cumbersome. Therefore, in the next section, 
we adopt a totally different approach to the calculation of time-dependence, which is able to give information about 
P(n,t) at late times. 

V. GENERATING FUNCTION 

The technique we will use to probe the time-dependence of P(n,t) in this section is based on the solution of the 
differential equation satisfied by the generating function 



JV 



F(z,t) = Y^ p {n,t)z n 



(54) 



n=0 



for our model in the mean-field approximation. Starting from (ph the derivation of this equation proceeds along 
standard lines EqJlIJ to yield 
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BF d 2 F OF 

— = -z{\ - zf — + (1 - z)(a + (3z)— - 7 (1 - z)F (55) 



where we have introduced a new time 

s = Qt , where ( = — _ , (56) 

and where the constants a, (3 and 7 are defined by 

a = u*-l = X*(S- 1) + N-1, 

/3 = A* + 1 - N and 7 = NX* . (57) 

The conditions on F are 

F(l,t) = l and F(z,0) = z m , (58) 

and follow from the normalization condition ^ -P(^, £) = 1 and the initial condition P(n, 0) = 5 n ,m respectively. 

The partial differential equation ( p5| ) is separable: if we write F(z, s) = S(s)$(z), then S(s) — e~ Xs , where A is a 
constant. The equation for $ is then 

z{\ - z) 2 ^r -(l- z )(a + (3z)^- + 7 (1 - z)$ = A$ . (59) 

az z az 

This can be brought into a more standard form by the change of variables 

$ = (1 - z) N <f> and u = . (60) 



The new form of the equation is 



u(l - u)-4 + [c-{a + b+ l)u] -^ - a60 = , (61) 



where 

a + 6=l-A*-iV-i/* 



a6 = A^ (A* + v * - 1) - A and c = 1 - A* - N . (62) 



The reason for making the transformation (pCj) is that <pW) is the standard form for the hypergeometric equation |L2| 
which has the two independent solutions: 



/,(!) 



u a F(a, a— c+ 1, a — fe+ l;w x ) and 



/.( 2 ) _„.-& 



'A 



= iT 6 .F(&,&-c+l,&-a + l;u~ 1 ). (63) 



Now u x = 1 — z and so in terms of the original variables 



$^ = (1 - z) N+a F(a, a-c+l,a-b+l;l-z) and 

$^ 2) = (1 - z) N+h F(b, b-c+l,b-a+l;l-z). (64) 



The general solution to (55) is then 

F(z, t) = Yl { v * *A° + w > $ i 2) } e_AC ' • ( 65 ) 

A 

where {v\} and {w\} are sets of arbitrary constants. 

To determine the arbitrary constants in (pq) , the conditions (pq) have to be implemented. The details are given in 
Appendix B, where it is shown that the required solution is 
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JV 



F(z, t) = ^ ie fc (1 - z t F ( k -N,k + X*,2k + X*S; 1 - z) e -*(*+A'S-i)C* ; ( 66 ) 

fc=0 

where the constants {wk} are determined by 

f.fir*^-^ r(» + A*)r(2fe + A*g) _[(-!)"(?), ifn<™ 

^ l j U-^y r(fc + A*)r(n + fc + A*S) \ , ifn>m. l j 

This equation holds for all allowed values of n (n = 0, 1, . . . , N). We therefore have (N + 1) linear conditions for the 
(N + 1) constants Wk (k = 0, 1, . . . , AT), and so can determine them uniquely. Thus, (|6q ) together with ( |67|) provide 
a complete solution to the partial differential equation (pq). 

Although the solution is not in closed form, it is possible to obtain the Wk for small values of k rather easily. For 
n = 0, (J67]) involves only wq, for n — 1 only wg and iui, and so on. The expressions for the first three constants are 



N 
W = 1 , w\ = — - m . 

(JV-1)(1 + A*) 

W-2 — 



(2 + A*^) 



N 



N(N-l) (1 + A*) mijn - 1) 

25 (1 + A* S) 2 ' l ' 



the result for wo confirming what we already knew. These results are very useful because, as is clear from (pw, the 
large-time behavior of the system is governed by small values of k. In this case, as we will now show, an explicit form 
for P(n, t) can be found. 

To find P(n,t) we have the identify the coefficient of z n in (pq). In Appendix B it is shown that this leads to 



iV 



P(n, t)=J2 w * *(n, fc ) e-^ k + rs - 1 ^ t , (69) 



fc=0 

where 

r— minjiV — k,n} 

(-l) n - r { 

\Tl— v I \ r 



r-max{n-fc,0} 



^ r(r + fc + A*) T(v*-r) T(2k + A* + v* - N) 

T(k + X*) T(k + u*-N) T(k + \*+v*) ' ^ ' 

This result appears to be rather complicated, but fortunately it simplifies in many cases of interest. For instance, 
suppose we wish to find P(0,t): the time-evolution of the probability that there are no individuals of the species 
present in the system. Since ^(0, k) has only one term in the sum (r — 0), 



T(2fc + v* + A* - N) 



«-w£ %t ;;; ;:;,./ 



-k(k+\*S-l)Ct 



k=0 



Y(k + v* - N)T{k + v* + \*) 



- s(U) \ +Wl (S-l)(X*S + N) e 

I .„ g(A*5 + l)(A*5 + 2)(A*5 + 3) e -wa + i)« \ (n] 

2 (S-l)(X*(S-l) + l){X*S + N){X*S + N+l) ' -r-'-J. V ) 

which describes the approach to the stationary state at large times. In Fig [ll] the temporal evolution of the expected 
number of species, estimated as S(t) = S(l — P(0,t)) is shown. The temporal solution provided by this method is 
exact as long as the total number of coefficients can be computed without numerical error. In Fig nfi a truncated, 
approximated solution is compared with the straightforward numerical integration of the master equation. Complete 
agreement is observed at large times. 
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FIG. 11. Temporal evolution ol the expected value for the number of species in the system. A truncated solution provided 
by (J7lj) is shown (dashed-lines). Only the first 20 Wk coefficients have been computed. The agreement with the numerical 
integration of the master equation (bold lines) gets better and better the larger time is. For N = 1000, S — 50, and C = 0.4 
such an agreement is quite good even for early times. 



If n 7^ 0, the large time behavior of P(n, t) still has a relatively simple form: 

P(n, t) = P s (n) + Wl v(n, 1) e - A * s ^* 

+ w 2X (n,2)e-^ x ' s + 1 ^ t + ..., 



(72) 



where x( n i 1) nas on ly two terms, r = n — 1 and r = n (only one if n — or n = N) and x( n i 2) has only three terms, 
r = n — 2,n — 1 and n (fewer if n = 0, 1, N — 1 or N). 

In Fign2, the computation of P(n, t) for n = 1, 2, 3,4 is shown. The solution is approximate because again just the 
first 20 Wk have been considered, although (|6fj) would be an exact solution as long as all of the terms from k — to 
k = N could be summed without numerical error. For practical reasons this is obviously not possible. In particular, 
at early times, the truncation of ( p9[ ) introduces errors in P(n,t). The same is true when n is too large, because the 
sums in (67) and (w3) are too long to be computed without errors and some numerical instabilities arise. 
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FIG. 12. Temporal evolution of probability of having any species represented by 1, 2, 3, and 4 individuals computed directly 
from (r59|) in dotted lines, where again a truncated solution is used (only the first 20 w% are considered), compared with the 
numerical integration of the master equation (bold lines). The initial condition is P(n, 0) = <5 m ,„ where m = 500 in the upper 
plot and m — in the lower one. 

VI. CONCLUSIONS 

In this paper we have analysed a model which has a structure which is rich enough to show many of the underlying 
patterns seen in real ecosystems, but is still sufficiently simple for a variant of the mean field approximation to be 
applied to obtain analytical results. The most straightforward question that can be asked concerns the nature of 
the stationary state. Within the mean field approximation an exact form for this stationary distribution may be 
derived. We found that this exact result reduced to the logseries distribution in the regime of low immigration and 
to the lognormal distribution in the regime of moderate to high immigration. These two distributions have been 
discussed by ecologists for decades as possible forms for the species abundance distributions. Our approach gives a 
clear interpretation to the parameters on which they depend. This fact has practical consequences for conservation 
biology in order to determine the potential richness (S), the global size (N) and the degree of isolation (/i) of a 
community. We have therefore shown how logseries and lognormal distributions can arise as two different limits of a 
single distribution, a distribution which is, moreover, the stationary distribution of a well-motivated ecological model. 
We also found evidence for the hyperbolic relation between the connectivity and the average number of species — the 
so-called C* — S relation. While we were able to derive this result in the low immigration regime, there was a small 
systematic derivation from the mean-field result and the simulation curves. 

While the stationary distribution is of considerable interest, the strength of the approach that we have adopted here 
is that predictions of the time evolution of the system are also possible. An approximation based on the number of 
individuals in the system being large led to the picture of P(n, t) as an approximately Gaussian distribution broadening 
and moving with time. This behavior may persist for quite a long time, especially when the immigration rate is high, 
but eventually the Gaussian form is lost at large times. To explore the approach to the stationary distribution a 
complimentary formalism was required. Such a method was discussed in section V where a formal general solution for 
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the temporal evolution of the probability of having any species represented by n individuals is given. This solution 
P(n,t) is given as a series expansion around the stationary state. In particular, such a solution allows one to predict 
quite well how the number of species in the system increases with time during the stochastic assembly process. 

In summary, we believe that this simple model has illuminated the general mechanisms at work in ecosystems and 
has allowed us to understand the broad features of some of the universal phenomena seen in these systems. We hope 
that the results presented here will motivate further work both in the increasingly sophisticated stochastic modelling 
of ecosystems and in the interpretation of ecological data within a theoretical framework. 
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APPENDIX A: 



In this appendix we will give details of the derivations of the simpler forms of the stationary distribution discussed 
in section III. 

In our analysis, we will frequently make use of the asymptotic form for T(z + a)/T(z) when z > 1 and z»a. Using 
Stirling's approximation for z ^ 1 one has that 



r(z + q) 
T(z) 



=•0+!) 



(z+a-l/2) 



cxp (—a) 



1 + P 1 (z)g)+P 2 (z)g 

where the Pi{z) are power series in 1/z. Therefore, if in addition we impose the condition a <C z, then to a very good 
approximation 



T(z + a) 



(1 + e) z ( 1+£ ) z a e~ a ; z>l; e=-«l 



(Al) 



Note that a need not be small, it simply has to be much less than z. 
Applying (JAfl) to ^" 3 (n, N) one finds (JV > 1 =>• v* > 1) 



f 3 (n,N) = 



T(N + 1 - re) 

r(A + i) 

N + l' 



T(u*-n) 



I>*) 



N + l 



-(N+l-n) 



(i/*-n) 



(A2) 



The term in curly brackets is equal to one, plus corrections which are negligible if n^ nax A*S/N 2 <C 1 and \*S <C N. 
Therefore, under these conditions 



' N\ f X*S 

T,{».X)*[—) «exp-nln(^l + — 

w exp (~n\*S/N) , 



(A3) 



using S > 1 and A*^ < N again. 

To find a simpler form for !Fi(n), we again apply (Al), but with the more stringent condition a <C 1. It then 
becomes T(z + a)/T(z) rs z a , and so for A* <C 1 and n > 1 



1 £(n + A^) X*n x ' A* 

Tl{n) = ^nx^) —rtnT ^r = ~ cxpA lnn - 



(A4) 



2:-! 



If we now ask that A* lnn max <C 1, we have that 

A* 



Fi{ri) w — ; n > 1 ; A* < 

n in n, 



(A5) 



To estimate P s (0) we apply (jAlj) directly to -F 2 (A0 in (|8|). Assuming X* S > 1 



^ 2 (JV) 



r(z/*+A*) 



r([i/* -i\r] + A*) 



r(i/* - n) 



v* - N 



A* 



A* 

1 + — 
v* 



-{v*+\*) 



v* - N 



(i/*-JV+A*)' 



(A6) 



Under the very reasonable assumptions (A*) 2 <C N and A* <C S, the curly brackets approximate very well to unity, 
and so if in addition A*^ <§C N, 



T 2 (N) 



X*S 



N + X*S 



it) w ^) a 



(A7) 



This result may be used to find a useful expression for the average number of species, (S) defined by (|26|). Since 
P s (0) « exp A* ln(A* 5/ AT) , it follows that 



(5) w (1 - cxp A* ln(A*S/A)) 5 « SA* In f ^L) , 

if A* hx{N/SX*) < 1. We now write 

ln(JV/SA*) =ln([l-/i]C*//*) 

= e _1 + In C* , where e" 1 = In ( 

w e _1 expelnC* , 



(A8) 



(A9) 



if e|lnC*|«l. 

Now suppose that we assume that A* <C e and e| lnC*| -C 1. It follows that A*e~ 1 [l + e| lnC*|] <C 1 and therefore 
that A*ln(iV/S'A*) <C 1. Thus this latter condition may be replaced by A* <C e and e|lnC*| <C 1. The immigration 
rate \i is typically much less than one, so e _1 « | In /j, | . Therefore the last condition becomes | lnC*| <C | ln/f|. Putting 
all of this together, we find that 



< s >^( c *)" = (i^):< c * 



-1+6 



(A10) 



if A*5 > 1 (since we have used (|A7|)), A* < e and | lnC*| < | ln/*|. 



APPENDIX B: 



In this appendix we give details of the calculations presented in section V. We begin by showing that is we apply the 
conditions ( pq ) to the general solution ( pq ) of the partial differential equation (55), we obtain ( pq ) with the constants 
{wfc} being determined by (p7|). 

First, let us apply the condition F(l,t) = 1. Now $^ } w (1 - z) w+a and $f } w (1 - z) Ar+b as 2 -* 1. So define 
a = N + a and b = N + b. Then from (|62|), 



b = 



[1- 


- A*5] - 0T 


-A*5] 2 +4A 




2 




[1- 


- A*S] + yfT 


-A*Sf+4A 



(Bl) 
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with ab = —A. Since the constants a and b appear in the differential equation ( |6l| ) symmetrically, we have made a 
choice as to which has the positive and which has the negative square root. From the general theory of the master 
equation pffl, the eigenvalues A are real and non-negative, and so a and b are real. Moreover, if A =/= 0, they are of 



different signs, since their product is negative. With the choice (Bl), a < and b > 0. From these results we deduce 
that $^' diverges and $)/ — ► as z — ► 1. We must therefore take v\ — for all A > 0. When A = 0, b = 1 - X*S, 
which may be negative, and so we also take wq = 0. Finally, 5 = when A = and so this term is the only one which 
is not zero or does not diverge as z — ► 1. Since the A = solution is the stationary solution, the condition F(l,t) = 1 
is automatically satisfied as long as the stationary solution is normalized. Therefore, the application of this condition 
has reduced ( p5| ) to 



F(z,t)=F 8 (z) + J2^( 1 



\N+b 



F(b,b- 



1,6 



1;1 



z) e 



-AC* 



(B2) 



A>0 



where F s (z) = Y^ n P s(n)z n . 

Before proceeding any further, we need to investigate the sum over A more carefully. We know that this sum should 
be over a set of discrete integers: F(z, t) = J2 n -P( n i ^) z ™ > n £ {0; 1, ■ ■ ■ , N}. An analysis of the structure of the 
hypergeometric function in (B2) for large z shows that this will only be so if b is equal to an integer which is zero or 
negative: b = —I. We can understand this condition by recalling JT3] that the function F(a', b',c '; x) is a polynomial of 
degree I (where I is a non-negative integer) in x if a' = —I. Therefore, if b = —I, then F in (B2) must be a polynomial 
of degree I in (1 — z), i.e. (1 — z) N+b F must be a polynomial of degree N + b + I = N in (1 — z), as required. 

From pl| ), a + b = [1 - X*S], SO if b = N + b = (N - I), then h = I - N + [1 - X*S]. Similarly, b - a = 
2N - 21 - [1 - X*S] = y/[l - A*^] 2 +4A. A short calculation then gives X = (N-l)(w- I), where w = N + X*S - 1. 
Since we require A > in the sum in (B2), then I = 0, 1, . . . , N — 1. So, in summary, 



X = (N - 1)(uj - I) 
a = l- N -u ; b 



I 



0,1,... 
c= 1 



,JV- 
-A* 



N, 



where 



ui = N + X*S-1 



Rather than I, it is preferable to use k = N — I to label the time-dependent solutions. Then (B2) becomes 



N 



F(z, t) = F s {z) + Y^ w k (1 - z) k F{k -N,k + X*,2k + X*S; 1 



-fe(fe+A*S-l)C* 



(B3) 
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fc=i 



where we have written Wui) as Wk for convenience. We note that if there were a k — term in the sum, it would 
equal wq> F(—N, A*, A* S; 1 — z) which in turn equals [G3 

' T(v* - N)T(u* + A*) v ' ' ' > 



w 



N 

E 



N~\T(n + X*)T(v*-n)T(X* + u*-N) n 



n) T(A*) T(v*-N) T(A*+z/*) 



(B6) 



which equals wq J2 n Ps{n)z n , using (|12|). Therefore (|B5| ) may be written in the alternative form 

N 

F(z, t) = J2 Wk (1 - ^) fe F(k -N,k + X*,2k + \*S; 1 - z) e ~Hk+^s-i)ct ^ 



(B7) 



fe=0 



where wo = 1. It is also possible to write the function F in (B7) as a Jacobi polynomial of order N — k fl2[| . 

The final step is to apply the initial condition given in fl58Q to (B7) in order to determine the sets of constants 
{wk}- This leads to 



N 



= Y w k (1 - z) k F(k ~N,k + X*,2k + X*S; 1 



(B8) 



fe=0 
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To solve (B8), let us write the hypergeometric function as 



F=Y / w (l - zY where /„ = (-1)" ( N k ) / fc + A *)» , 
^ J V ; ""* V M n J (2k + X*S) n ' 

n— v / 



where the symbol (a)„ means T(a + n)/r(a). Then rearranging the double sum gives 
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N 



£ (1 - *) n 2 w k / n _ fc = £ c„ (1 - z)» , 



(B9) 



(BIO) 



n=0 



fc=0 



n=0 



where c n — X)fc=o ^ fn—k- However, it is straightforward to determine the c„: by expanding (1 — [1 — z]) m in powers 
of (1 — z) we find that c n vanishes for n > m and is proportional to a binomial coefficient for n < m. Using this result 
and writing out f n -k fully yields 



^ fcl J ^n-fcy r(fc + A*)I> + fc + A*S) \ , if 



n < m 
n > m 



(Bll) 



We now turn to the problem of finding P(n,t), which involves identifying the the coefficient of z n in (|B7|). Let us 
begin by considering F(k — N, k + A*, 2fc + A*^; f — z). By following exactly the same steps that lead to QB6|) in the 
k = case, but this time for general fc, we find that this function equals 



r(u*)T(2k + v* + \* -TV) 

T{k + v* -N)T(k + v* + A*) 



,F(fc- N,fc + A*,l-^*;z) 
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Therefore, we may write the solution for F(z, t), eqn. (B7), as 
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F{z,t) = J2 «*(1 " *)* E Pr(k)z r e- k ^ k + x ' s - 1 
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where 



Pr(k) = 
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T(k + A*) r(fc + v* - N) r(fc + A* + z/*) 



To identify powers of z in (B13) we break it down further 
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we have from (B15) 
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(B12) 



(B13) 



(B14) 



(B15) 



(B16) 



(B17) 
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So to find P(n,t) we have to determine x{n, fc), which is the coefficient of z n in (B16). If we denote (—l) l (j) by a\ 
and p r (k) by 6 r , this is like asking: what is the coefficient of z n in 

[a + a ± z + . . . + a&^] [6 + hz + . . . + ftjv-fcS*"*] ? 

The answer is: 

r—m.in{N — k,n} 

/ drt. — rOr 



r— max{n— fc,0} 



i n _ r iy r . 



Going back to the variables relevant to the problem under consideration, 

r—min{N—k,n} 



x(n,k)= Y, (-i)"- r M*0- (B18) 

r— max{n — A;,0} 

The complicated limits on the sum can be relaxed with a suitable interpretation of the binomial coefficient ( L ) . For 
instance, 

/ k \ - n-k+[n-r\) 
[ ! \n-rj (n-r)ir(-fc) ' 

If (n — r) < fc, both the numerator and the denominator on the right-hand-side diverge in such a way that the ratio is 
finite and non-zero and is commonly written as the left-hand-side. If (n — r) > fc, then only the denom inator diverges 



and the right-hand-side is zero. With this understanding, the lower condition on the sum in (Bl£) can simply be 



replaced by r = 0, since there is no contribution if (n — 7') > fc, that is, if r < n — fc. A similar i nterp retation of the 



binomial coefficient in p r (k) can be used to remove the upper condition on the sum. The result (B18), together with 



the definition of p r (k), determine the P(n,t) given by (B17) 
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